GPCRome-wide analysis of G-protein-coupling diversity using a computational biology approach

GPCRs are master regulators of cell signaling by transducing extracellular stimuli into the cell via selective coupling to intracellular G-proteins. Here we present a computational analysis of the structural determinants of G-protein-coupling repertoire of experimental and predicted 3D GPCR-G-protein complexes. Interface contact analysis recapitulates structural hallmarks associated with G-protein-coupling specificity, including TM5, TM6 and ICLs. We employ interface contacts as fingerprints to cluster Gs vs Gi complexes in an unsupervised fashion, suggesting that interface residues contribute to selective coupling. We experimentally confirm on a promiscuous receptor (CCKAR) that mutations of some of these specificity-determining positions bias the coupling selectivity. Interestingly, Gs-GPCR complexes have more conserved interfaces, while Gi/o proteins adopt a wider number of alternative docking poses, as assessed via structural alignments of representative 3D complexes. Binding energy calculations demonstrate that distinct structural properties of the complexes are associated to higher stability of Gs than Gi/o complexes. AlphaFold2 predictions of experimental binary complexes confirm several of these structural features and allow us to augment the structural coverage of poorly characterized complexes such as G12/13.

insights into the sequence determinants of coupling specificity, for the entire GPCR family 20 as well as for specific subfamilies 21 . The abundance of experimental structures for GPCRs in ligand-dependent, alternative functional states has also shed light on the structural hallmarks controlling receptor activation for class A GPCR 22 as well as across classes 23 .
The determination of receptor-G-protein complex structures is also progressing rapidly, with over 360 complex structures deposited in the PDB (as of March 2023). The determination of the first structures of G i coupled receptor complexes allowed for initial comparisons with G s counterparts and highlighted a role of TM5 and TM6 as selectivity filter [24][25][26][27][28] . As a complement to the release of the MT1-G i complex, we also systematically compared available G s and G i/o complexes with Class A receptors in terms of interface contact networks and G-protein docking mode similarity assessed via structural alignment 29 . The recent determination of four structures of the serotonin receptors (e.g. 5-HT4, 5-HT6 and 5-HT7 with G s , and 5-HT4 with G i1 ) confirmed the role of TM5 and TM6, and in particular their variable length, as a selectivity filter for G-protein binding 30 . The authors also showed via bioinformatics analysis that this macro-switch is conserved among other class A GPCRs 30 . Yet, a comprehensive picture of the structural hallmarks of coupling specificity remains elusive.
In this work we analyze through structural bioinformatics experimental, as well as predicted, GPCR-G-protein 3D complexes to shed further light on the structural basis of coupling specificity through the analysis of interaction interface contact networks, G-protein docking modes and binding energies (Fig. 1A).

Different G-protein complexes are characterized by alternative contact network topologies
We considered a total of 362 3D experimental GPCR-G-protein complexes, comprising 166 G s , 184 G i/o , 9 G q/11 and 3 G 12/13 complexes, corresponding to 93, 17, 10, 3, and 2 unique receptors from Class A, B1, B2, C and Frizzled, respectively, and entailing 9 different G-proteins (i.e. GNAS, GNAI1, GNAI2, GNAI3, GNAT1, GNAO1, GNAQ, GNA11, GNA13) ( Fig. 1B; Supplementary Data 1). To avoid any bias due to redundant structures solved for the same GPCR-G-protein complex, we derived a set of 125 non-redundant 3D complexes by considering representative structures for each receptor-G-protein pair, using resolution and canonical sequence coverage as selection criteria ( Fig. 1C; see Methods). We first identified the residues that are in spatial contact at the GPCR-G-protein interaction interface (see Methods). We then mapped contacting residues to consensus numbering through GPCRdb 31 (Supplementary Data 2) and the common G-protein numbering (CGN) 32 schemes (Supplementary Data 3), respectively for GPCRs and G-proteins. We aggregated contacts based on secondary structure elements (SSEs; Fig. 2A), to yield a network of interacting SSE elements at GPCR-G-protein interfaces ( Fig. 1A and Fig. 2A). For the most abundant coupling groups (i.e. G s and G i/o ), we derived specific SSE contact networks by pooling contacts on the basis of the bound G-protein. SSE contact networks highlight structural signatures specific to each coupling group. Certain SSEs are invariably central within the interface network, such as TM5, ICL2, or ICL3 for GPCRs ( Fig. 2B-D) or H5 for the G-protein (Fig. 2B, C, E). Other elements vary their connectivity based on the bound G-protein. In particular, TM5 has a higher degree of interacting SSEs in G s complexes as well as an overall number of contacts, while G i/o complexes are instead characterized by higher interconnectivity at the ICL1, TM3, TM6, and ICL3 ( Fig. 2B-D). Differences in the overall network topology also emerged when we measured the information flow, quantified as the number of shortest paths passing through each node (i.e. betweenness centrality; see Methods). Indeed, TM5, ICL2, and H8 have a higher betweenness centrality in G s complexes, while TM3, TM6, and ICL1 prevail in G i/o ones (Fig. 2F). Overall, G s contact graphs are significantly different from G i/o ones, as assessed by comparing distances, computed as the Frobenius norm of the difference between the adjacency matrices of the interface contact graphs (permanova P = 1E-06; see Methods).

Contact interface fingerprints imprint coupling specificity
We employed interface contacts to build interaction fingerprints, which are vectors that numerically encode the presence or absence of contact, and which can be used to compare in an unsupervised way GPCR-G-protein complexes based on their interface's structural features (Fig. 1A). We have generated interface fingerprints by mapping either residue pairs at each vector position (Complex fingerprints, or CF), or contact positions separately for the receptor and G-protein (Receptor and G-protein fingerprints, respectively RF and GF; see Methods). We also estimated the contact positions that are more frequent than expected in each coupling group through log-odds ratio statistics (see Methods), and we used this information to filter the most informative contacts for G s and G i/o couplings (Fig. 3). CF clustering identifies two main clusters: the largest one (cluster 1), is enriched with G s complexes from both class A and Class B, as well as some G i/o complexes involving class A, class C and F receptors ( Fig. 3 and Fig. 4A). The second cluster (i.e. cluster 2), is enriched almost exclusively with G i/o complexes ( Fig. 3 and Fig. 4A). In cluster 2, more receptors show promiscuous couplings towards other G-proteins, in particular towards G 12/13 (35% vs 15% in Cluster 1) and G q/11 (55% vs 45% in Cluster 1) (Fig. 4A). The higher promiscuity between G i/o and G 12/13 couplings is also observed when considering all couplings, even with no structure, available from the Universal Coupling Map (UCM 33 ; Supplementary  Fig. 1). When relaxing the criteria to perform CF clustering by removing the log-odds ratio filter and by including all unique complexes, the clustering is now mainly guided by the receptor class, with the largest one entailing Class A complexes and the smallest one all the other classes ( Supplementary Fig. 2). Within each cluster, subclusters enriched in G s or G i/o complexes can be identified. Available G 12/13 and most of the G q/11 complexes cluster within the G i/o -enriched subcluster ( Supplementary Fig. 2). In detail, the S1PR2-GNA13 structure (i.e., PDBID: 7T6B), is clustered within Class A, G i/o subgroup along with other G 12/13 binders, such as LPAR1 and S1PR5. Similarly, ADGRG1-and ADGRL3-GNA13 complexes are found within the Gi/o subcluster of the second, class B enriched, cluster ( Supplementary Fig. 2). This suggests a structural (likely evolutionary imprinted -see Discussion) connection between G i/o , G q/11 and G 12/13 proteins.
Overall, G s couplings are characterized by a significantly higher number of enriched contacts with respect to G i/o ones (Pmann-whitney = 4.69E-14; Fig. 4B). We also performed clustering and enrichment with fingerprints of receptors (RF) and G-proteins (GF) separately. The RF clustering chiefly points to inter-class differences, separating complexes formed by ClassA receptors from those involving other classes ( Supplementary Fig. 3A), while not showing particular contact enrichment differences between G s and G i/o complexes (Pmann-whitney = 4.4E-1; Fig. 4C). The GF clustering better separates these groups ( Supplementary Fig. 3B) and displays marginally significant differences in contact enrichment distributions (Pmann-whitney = 4.4E-2; Fig. 4C). This suggests that the combination between G-protein and receptor's residues provides maximum fine-tuning to the recognition process (Fig. 4B).
Notably, certain GPCR positions hold switch characteristics, in other words, some of the contacts that they mediate are enriched in G s and others in G i/o depending on the partner residues. For example, the contact of 5.65 with G.H5.16 is enriched in G i/o , while the ones with G.H5.25 and G.H5.26 are in G s . Similar patterns are observed for distinct contacts mediated by positions 5.68,5.69 and 5.72 ( Fig. 3 and Fig. 4D, E).

Switching G-protein selectivity through contact interface mutation
To demonstrate the effect on G-protein coupling of the identified contact fingerprints, we employed a multistate-design computational protocol 35 to design receptor mutants specific for a selected G-protein and with reduced affinity for the others, which we then validated through the NanoBit G-protein dissociation assay 14 ( Fig. 5A; Methods). We chose as starting templates for the design the structures of CCKAR, which has been solved in complex with both G s (PDB ID: 7EZK) and G i/o (PDB ID: 7EZH), and we focused the mutagenesis on the receptor positions forming the contact pairs most enriched in G s or G i/o complexes (Supplementary Data 4). We carried out two sets of designs: in one hand we sought to retain G s while removing G i/o couplings, and on the other, we maintained G i/o while reducing binding to G s (Fig. 5A). We found that certain mutations were more recurrent in top-designed sequences ( Fig. 5A; see Methods). In particular, mutations A303 6.25 K, contact network analysis. A A representative 3D complex structure (PDB 6CMO) with GPCR and G-protein SSE labels; B SSE contact network for G s complexes: GPCR and G-protein nodes are colored in green and cyan, respectively. Node diameter is proportional to the total number of contacts mediated by that SSE. Edge thickness is proportional to the number of contacts between connected SSEs and coloring (darker red) is directly proportional to contact conservation; C SSE contact network for G i/o complexes. Network characteristics as in 2 A; D GPCR SSE network node degree distribution for G s and G i/o networks; E G-protein SSE network node degree distribution for G s and G i/o networks; F GPCR SSE network betweenness centrality distribution. Source data are provided as a Source Data file. V311 6.33 H, K375 8.48 R and R376 8.49 L were predicted to reduce G i/o while retaining G s binding, whereas mutations S149 ICL2.53 A, V151 ICL2.55 K, K308 6.30 R, and K375 8.48 P were recurrently predicted to reduce G s while retaining G i/o binding ( Fig. 5A-D). These were subsequently tested through the NanoBiT G-protein dissociation assay using the NanoBiT-G s and NanoBiT-G i1 sensors. Among the eight designed mutations, two (V311 6.33 H and R376 8.49 V) and one (K308 6.30 R) were found to be G s -over-G i biased and G i -over-G s biased, respectively, and the rest of the five had no effect on the G s -vs-G i balance (Fig. 5E-G and Supplementary Fig. 4). We note that expression level of V311 6.33 H was equivalent to that of WT (1:4). These data confirmed the importance of the identified contacts in switching coupling preferences between these two G-proteins. GPCR-G-protein contact interface fingerprint (or CF fingerprint): each row is a GPCR-G-protein contact position (referenced respectively to GPCRdb numbering and G-protein position (CGN) numbering) and each column is a unique receptor complex. If a receptor is complexed with more than one G-protein, its complex fingerprint is reported accordingly. Columns are color-annotated to indicate: G-protein bound in the experimental structure, GPCR class, and experimentally reported coupling (according to UCM, or either GEMTA, Shedding, or GtoPDB). The right-side plot indicates the log-odds ratios (LORs) of the contacts observed at each position. Only contacts present in at least 10% of the structures and having an absolute LOR value greater than 2 are considered. Source data are provided as a Source Data file.

Different repertoires of G proteins docking modes
We assessed the overall 3D similarity of GPCR-G-protein complexes via structural alignment, with a particular focus on the docking mode similarity of the G-protein α-subunits with respect to the receptor. To this end, we first superimposed the Cα-atoms of the most conserved positions within the 7TM bundle (i.e. that are present in all the solved structures) and then calculated the Root Mean Squared Deviation (RMSD) of the Cα-atoms of conserved positions of the Gα subunit ( Fig. 1A; see Methods). The clustering of 3D complexes based on their RMSD shows that G s complexes tend to cluster separately from G i/o ones (Pmann-whitney = 2.5E-14; P-permanova = 1E-6; Fig. 6, Fig. 7A and Supplementary Fig. 5). The largest cluster comprises only Class A receptors, the vast majority bound to G i/o proteins, with the only exception of a few G s complexes (i.e. MC2R, GALR2), as well as the BDKRB2-GNAQ and S1PR2-GNA13 complexes (Fig. 6). The second largest cluster involves the vast majority of G s complexes, including Class A and the totality of class B receptors. This cluster also comprises several G q/11 (i.e. CCKAR, CCKBR, CRHR2, HRH1), G i/o (i.e. OPRK1, OPRD1, OPRL1, OPRM1, CX3CR1, ADGRF1 and ADGRG3) as well as two G 12/13 complexes (i.e. ADGRG1 and ADGRL3). Finally, a third, smaller outgroup cluster contains complexes involving class A, C, and F receptors most deviating from the other structures. Also in this case, the receptors in the largest, G i/o enriched cluster show a more promiscuous tendency, with G q/11 and G 12/13 as most recurrent secondary couplings and G s as the least recurrent one ( Fig. 6 and Fig. 7A). When considering only class A receptors, the RMSD distributions are no longer significantly different (Pmann-whitney = 6.7E-1; Supplementary  Fig. 6), although characterized by significantly different centroids (Ppermanova = 1E-6). We have also estimated residue level deviations of the Gα subunits of the fitted complexes by calculating Root Mean Square Fluctuations (RMSF; see Methods) and compared the profiles obtained for G s and G i/o complexes, which highlighted significantly higher fluctuations for G i/o complexes with respect to G s ones (Wilcoxon P = 1.18E-13), for the whole Ras GTPase domain and in particular for regions such as H2, H3 as well as the C-terminal lobe of the Ras domain (Fig. 7C). Overall, G s complexes display less variability of the terminals, with H5 appearing more conformationally restrained and bent towards TM3 and ICL2, while G i/o complexes display greater conformational variability for αN and H5 (Fig. 7D left). A comparison of representative structures of the four coupling groups show slight differences in the docking mode of each representative, which are nevertheless smaller for G i/o , G q , and G 12/13 (Fig. 7E).
We also explored the potential conformational bias of the nanobodies (i.e. Fab16 or Nb35) used to stabilize the bound G-protein on the observed G-protein docking modes. First, we annotated the presence/ absence of the nanobody for each complex subjected to RMSD clustering. We observed no correlation between RMSD clusters and the presence or absence of nanobodies ( Fig. 6 and Supplementary Fig. 5). Second, we relaxed the GPCR-heterotrimeric G-protein complex without such nanobodies, using state-of-the-art methods for structural refinement (Rosetta relax; see Methods). Both RMSD and RMSF analysis performed on relaxed structures showed even larger statistically significant differences between G s and G i/o complexes (Pmann-whitney = 1.1-36; Supplementary Fig. 7). Notably, the differences of the G s and G i/o RMSD distributions is still significant when considering only class A receptors (Pmann-whitney = 6.5E-6; Supplementary Fig. 8).

Different energies characterize specific GPCR-G-protein interfaces
We exploited the relaxed GPCR-heterotrimeric G-protein complexes to further characterize the binding interface energy of the complex using Rosetta InterfaceAnalyzer 36 (see Methods). By considering all available GPCR Gα-subunit pairs with an experimentally resolved complex, we showed that the ΔG of binding of G s complexes is significantly lower than G i/o complexes (Pmann-whitney = 7.2E-6; Fig. 8A) and it partially correlates with the slightly higher ΔSASA observed for G s complexes compared to G i/o ones (Pmann-whitney = 1.4E-3; Fig. 8B). When considering class A receptors only, the difference in binding energy distribution between G s and G i/o complexes is even larger (Pmannwhitney = 3.4E-6; Supplementary Fig. 9). Intriguingly, we observed that G s is bound less strongly to class B1 than class A receptors (Pmannwhitney = 2.3E-3; Fig. 8C), suggesting that receptors from different classes might bind to the same G-protein with different affinities due to different structural and functional requirements. On the other hand, the same receptor always binds with higher affinity to G s than G i/o . Indeed, we have compared the binding energies of complexes of the same receptor (e.g. GCGR, CCKAR, and HTR4) with both G s and G i/o proteins. Notably, the ΔG of binding for G s is always lower and is characterized by higher ΔSASA compared to G i/o irrespective of the slight docking modes variations observed in G s complex structures of the same receptor ( Fig. 8D-F).

AlphaFold2 predictions extend our understanding of the structural basis of coupling diversity
To further our understanding of the structural basis of GPCR-G-protein recognition, we predicted through AlphaFold-multimer (v2.3) 37 996 GPCR -G-protein alpha subunit pairs from UCM binary interactions (see Methods). We first benchmarked AF-multimer by assessing the prediction performances with or without 3D templates for 125 representative GPCR-G-protein complexes with available experimental structures. Using 3D templates in the prediction slightly improves the performances, assessed by measuring the deviation of predicted vs. experimental interfaces via either the DockQ metric (Wilcoxon P = 0.021; Fig. 9A) or the fraction of native contacts (Wilcoxon P = 0.004; Fig. 9B). It must be emphasized that even without templates multimer's predictions achieve almost comparable performances, yielding an average DockQ score of 0.645 vs. 0.664 obtained for predictions with templates (Fig. 9A). We, therefore, ran multimer predictions with available templates for receptor-heterotrimeric G-protein complexes whose receptor-α subunit binary interactions are reported in the UCM dataset. Since some predicted complexes showed unrealistic docking topology, we created a composite filter based on structural, topological, and statistical scoring metrics (e.g. pDockQ) to remove predicted structures with unrealistic complex topologies as well as regions predicted with low confidence (based on pLDDT; see Methods). Such a filtering scheme improves the correlation between prediction scores (pDockQ) and distance from experimental structures (DockQ) in the benchmark (Fig. 9C, D). We applied this filter to the multimer's predictions, yielding a set of 825 complexes for downstream processing (Supplementary Data 5). Contact analysis performed on predicted complexes revealed patterns similar to those observed experimentally, in particular, the models recapitulated most of the experimental contacts of G s and G i/o complexes (Fig. 9E). As expected, the agreement of contact patterns from experimental and predicted G q/11 and G 12/13 complexes is lower due to fewer structural templates available for this coupling groups (Fig. 9E). Notably, contact heatmaps derived from each G-protein group display highly specific patterns, which could potentially illuminate signaling mechanisms for poorly studied G-proteins such as G 12/13 . For instance, contact frequency heatmaps for G 12/13 complexes display peculiar patterns mediated by TM5, ICL3, and TM6 with G.H5. Several of the contacts observed for other G-protein complexes at positions G.H5.25 and G.H5.26 are missing (Fig. 9E). On the other hand, contacts mediated by positions G.H5.23 and G.H5.24 with 6.33, as well as 6.29, appear as highly specific for G 12/13 . We also confirmed on predicted structures the differences in binding energies observed between G s and G i/o complexes (Pmann-whitney = 9.8E-4; Fig. 9F). We also found that G q/11 complexes are as stable as G s , while G 12/13 ones are less affine, similarly to G i/o (Pmann-whitney = 9.8E-4; Fig. 9F). Such differences in binding energies are anti-correlated with differences in Interface Area (Fig. 9H). Additionally, we found interclass significant differences for G s PCRs, with ClassA being the most and ClassC the least stable (Pmann-whitney = 5.8E-3; Fig. 9H).

Discussion
In the present study, we have performed a computational, comparative analysis of 3D GPCR-G-protein complexes in their nucleotide-free state, to identify structural hallmarks of the interaction interfaces that might be linked to coupling specificity.
Complexes involving different G-proteins are characterized by distinctive structural signatures, like contact networks displaying a different engagement of secondary structural elements such as TM5, TM6, and ICLs. This notion is in line with earlier comparative analysis highlighting a selectivity filter operated by TM5 and TM6 (commented on 28 ). More recently, structural determination of serotonin receptors in complex with either G s or G i proteins, accompanied by bioinformatics analysis of a representative set of class A complexes, supported the role of these secondary structure elements by highlighting a macro-switch, operated by TM5 and TM6 terminals' variable length, which dictates selectivity towards G s vs G i/o 30 . Our unsupervised analysis of interface contacts entails complementary interactions between key positions on both receptor's and G-protein's sides. G s complexes are characterized by significantly higher fractions of enriched contacts, that are mainly imposed by contacting positions on the Error bars represent SEM. Statistical analysis was performed using the two-way ANOVA with Sidak's correction for multiple comparisons (A303 6.25 K, V311 6 = 3). F, G G s -and G i -coupling activity of the CCKAR mutants. HEK293 cells transiently expressing the indicated CCKAR construct along with the NanoBiT-G s or the NanoBiT-G i1 sensor were subjected to the NanoBiT-G-protein-dissociation assay using CCK-8 as a ligand. The ligand-response parameters E max (F) and ΔpEC 50 (G), which were normalized to WT (1:1), were used to denote the G-protein-coupling activity. Data are from 3-5 independent experiments with each dot representing an individual experiment. Error bars represent SEM. Statistical analysis was performed using the multiple paired, two-sided t-test between G s and G i1 (n s , P > 0.05; * P < 0.05; *** P < 0.001). G-protein's side. However, the combination of GPCR and G-protein residues leads to maximum discrimination between the G s and G i/o groups. These results are reminiscent of the G-protein barcode model for coupling specificity, which emerges by the presentation of an evolutionary more rigid G-protein barcode to a more flexible receptor counterpart 20 . At the same time, G s bound receptors are characterized by a less promiscuous binding, suggesting that the structural requirements for G s specific binding overall impose discrimination for secondary couplings.
Certain SSEs are exclusively characterized by contact enrichments for specific G-protein (e.g. TM2, TM3 or H8 for G i/o ). Notably, the DRY motif-mediated contacts, particularly D 3.49 's ones, can be considered one of the main structural hallmarks of G i/o vs G s complexes. The latter is indeed characterized by a greater bending of G-protein's H5 towards TM3's C-term and ICL2, which function as hinges to concomitantly detach the G-protein C-term from the DRY motif. Other regions are characterized by a switch-like character, meaning that certain positions form contacts enriched in G i/o and others in G s , including: ICL2, which encompasses the known ICL2.51 specificity determinant position 38 , TM5, ICL3 and TM6. We noted that the switching behavior might pertain also to individual positions and depend on the specific contact partner. These residue-level features likely work in coordination with the macro-switch described for the serotonin receptors 30 . We leveraged our findings to implement a computational multi-state design procedure to predict mutations biasing a given coupling, starting from a promiscuous WT sequence (e.g. CCKAR), which we experimentally verified via NanoBiT G-protein dissociation assays. Experiments identified two mutations (V311 6.33 H and R376 8.49 V) to bias G s and one (K308 6.30 R) to bias G i signaling, confirming the importance of the identified contacts in switching coupling preferences between these two G-proteins.
Through our comprehensive analysis, we show that the few G q/11 and G 13 complexes available display certain structural characteristics similar to G i/o complexes. These correlate with recent phylogenetic analysis showing that G i and G q family members share a common ancestor, and that G 12/13 's ancestor is likely a retro-gene originated by retroposition from a pre-G q gene 39 .
The usage of a state-of-art AI model (i.e. AlphaFold-multimer 37 ) for structural prediction also allowed us to expand the structural repertoire of GPCR-G-protein complexes. This is particularly valuable for poorly characterized groups, such as G 12/13 ones. Indeed, we predicted peculiar contact patterns at the TM5 and ICL3 that are characteristic of this group and might suggest unique structural requirements. Indeed, the critical importance of these regions also emerged in our previous effort to engineer a G 12 -DREADD, which was achieved by swapping shorter ICL3 loops from GPR183 or GPR132 on hM3D and experimentally validated to be functional 14 .
The observed structural differences are linked with different predicted binding affinities for the distinct coupling group complexes, with G s complexes being more stable than G i/o ones. Slight differences in interface contacts and docking mode might reflect the difference in binding affinities for G s observed for class A vs. class B receptors. We speculate that the higher binding energy of class B receptor, suggestive of less stable binding to G s , might be related to slower rates for G-protein activation observed for a representative class B receptor (GCGR) compared to class A one (ADRB2) 34 . On the other hand, a comparison of the binding energies of G s and G i/o complexes also revealed significant differences, which holds true even when considering the complexes with the same receptor, further stressing the major contribution from the G-protein residues in priming the binding. Such differences in binding energies are also confirmed in AF-multimer predicted models, which also allow us to characterize the members of other families such as G q/11 and G 12/13 . We speculate that G s has a higher energetic barrier for activation due to its peculiar biological role of AC activation, which should be tightly regulated. The conformational restriction in G s complexes might contribute to spatio-temporally finetune AC activation. On the other hand, looser G i/o binding to cognate receptors could explain their reported faster nucleotide turnover 40 . Moreover, lower structural conservation and less affine G i/o complexes are likely connected to the success of this coupling 14,33 , which is instrumental in providing a redundant mechanism for AC inhibition. Structural features and energetics are certainly not the only factors governing the evolutionary success of a certain coupling: indeed G 12/13 , which form complexes predicted as weak as the G i/o ones, are the least successful couplings. Higher-order spatio-temporal dynamics, such as the more recent evolution of G 12/13 39 , might explain these patterns.
A limitation of this study is that most of the structures considered are in the nucleotide-free state. Indeed, previous studies have suggested that intermediate states other than the nucleotide-free complexes influence G-protein-coupling selectivity 41 . Nucleotidedecoupled G proteins mutants, which bypass the intermediate-state complexes, are characterized by a degradation of coupling selectivity, indirectly highlighting the importance of the conformational dynamics of these intermediate-state complexes in guiding selectivity 42 . Future structural studies systematically targeting the intermediate states of GPCR-G protein complexes will improve the understanding of coupling selectivity.
The greater availability of experimental complex structures and increasingly accurate predicted models, including context-aware and diffusion-based deep learning techniques to predict alternative conformational states, will allow us to better understand the structural basis of G-protein-coupling specificity in the future. This knowledge will be key to design better-biased drugs, able to modulate only certain transducers, as well as it will be leveraged to improve the design of novel chemogenetic probes such as DREADDs.

Data sets
We used the mapping between PDB 43   family) to identify GPCRs. We found 376 structures that met this criterion. If more than one GPCR or Gα chain were in the structure, we considered the pair of chains with the highest number of contacts between them (Supplementary Data 1).
We considered as contacts all the pairs of amino acids with less than 8 Å between their Cβ (Cɑ for glycine -see below), following standard practices employed for contact analysis in structural predictions 46 . Only the structures in which all the contacts between the GPCR and Gα chains were mapped to the same pair of Uniprot accessions (according to SIFTS xml residue level mappings) were kept for further analysis. We excluded the remaining structures that were considered as chimeric. In this way, we obtained 362 structures.
Whenever we found more than one structure representing the same GPCR-G-protein pair, we considered the one with the highest Article https://doi.org/10.1038/s41467-023-40045-y resolution to be the representative for each complex. In case that more than one structure had the same resolution, we chose the structure which covered the largest portion of the corresponding G-protein according to SIFTS. We found 125 GPCR-Gprotein pairs with at least one solved structure (Supplementary Data 1).

GPCR-Gα complexes prediction via AlphaFold-Multimer
We used Alphafold-Multimer v2.3.13732 to generate 3D models of GPCR-heterotrimeric G-protein complexes. The human canonical protein sequences of GNB1 and GNG1 were used as the β and γ subunits, respectively. We trimmed the N-terminal part of the GPCRs up to 50 residues before the start of TM1, to avoid folding the long extracellular portion of some receptors. The databases required to run AlphaFold-Multimer were downloaded on 12 January 2023. Among the 5 models generated for each GPCR-G-protein complex, only the best one was considered for further analysis. The score used to evaluate the models was the default one used by AlphaFold-Multimer (0.2*pTM + 0.8*ipTM).
To assess the reliability of AF-multimer models, we predicted the structure of the 125 GPCR-G-protein complexes with available experimental structures. We used the human sequences of the corresponding GPCR-Gα pairs in all predictions. We generated 3D models "using templates", that is allowing AlphaFold-Multimer to use the 3D structural models available in the PDB (they are used only to predict the structure of single chains), and "without templates", running the predictions setting --max_template_date=01-01-1900 instead, to avoid the usage of any available experimental template.
A total of 996 GPCR-Gɑ pairs, reported to bind in UCM 33 , were considered, respectively corresponding to 164 and 13 human GPCRs and Gα proteins (the three members of the transducin family were not considered). We removed 21 predicted complexes (975 structures left) based on unrealistic complex topologies, i.e. not having any of the following G-protein's H5 positions, i.e. H5.16, H5.19, H5.20, H5.23, H5.24, H5.25, that we found to most recurrently mediate contacts with GPCR's residues in experimental structures. To further remove low-quality models produced by Alphafold, we used the pDockQ metric 47 , which is fine-tuned on predicting the DockQ of the predicted complex with respect to experimental complexes. We kept only the structures with pDockQ ≥ 0.23 48 (118 out of 975 structures removed) 36 .
We then employed Alphafold's confidence score (predicted Local Distance Difference test -pLDDT), to remove protein terminals predicted with low confidence which might lead to artefactual contacts. Hence, we removed all the residues with pLDDT <70. We then performed the interface analysis on these trimmed sequences using the same procedure used for experimental complexes. As a last filter to remove low-quality structures, we used the output of Rosetta InterfaceAnalyzer 36 (https://www.rosettacommons.org/docs/latest/ application_documentation/analysis/interfacE-analyzer; see below paragraph "Analysis of GPCR-Gα binding energy with Rosetta") and we kept only models with ΔSASA ≥ 1500 Å 2 , which is slightly less than the minimum ΔSASA found in all experimental structures. In this way we removed 32 more structures, leading to a total of 825 high-quality complex structures for downstream analysis (Supplementary Data 5).

Contact analysis
We considered residue-residue contacts mediating GPCR-G-protein interfaces as those having the Cβ spatially closer than 8 Å (Cɑ for glycine) (as in 46 ). We analyzed 362 solved PDB GPCRs-G-protein complexes, which according to G protein family classification comprised 184 G i/o , 166 G s , 9 G q/11 , and 3 G 12/13 structures. We mapped interface PDB residues to Uniprot canonical sequences residues by using SIFTS residue level mappings from individual PDB XML files. The interface sequence positions were then mapped to GPCRdb numbering 31 (Supplementary Data 2) and to the Common G-protein Numbering (CGN) schemes 32 (Supplementary Data 3). We aggregated contacts of different structures referred to the same GPCR-G-protein complex and considered equivalent residues pairs from different structures only once to avoid redundancy. We compared G i/o and G s complexes by creating a consensus list of GPCR and G-protein positions found in contact with at least one of the members of the two groups. For each GPCR-G-protein consensus positions, we calculated the fraction of GPCRs displaying a contact in each G-protein family group (either G i/o or G s ) over the total number of GPCRs in contact. Such a fraction reflects the conservation of contact in each G-protein group. We constructed an interface contact network by considering the consensus list of GPCR-G-protein contacts from all complexes. Contacting positions were projected to the secondary structure elements of both GPCRs and G-proteins, represented as nodes of the networks. Node diameter is proportional to the total number of contacts mediated by the position of that secondary structure element. Edge width is proportional to the number of unique GPCRs mediating contact between the two linked SSEs. Edge coloring (bright to dark red) is proportional to the average of the contact fraction of individual position pairs, and it reflects the overall contact conservation between two SSEs. Dashed lines indicate contact not formed in each G-protein class but present in the other. We calculated network statistics such as node degree and centrality betweenness distribution through Cytoscape (https:// cytoscape.org/) 49 and customized Python scripts. All the analyzes have been done through customized Python scripts, using biopython libraries (version 1.78). Network drawings have been generated through Cytoscape.
To analyze the statistical significance of the difference between G s and G i/o contact fingerprints, we generated an inter-chain contact graph for each structure. We then used the Frobenius norm of the difference between the adjacency matrices of the interchain contact Fig. 9 | AlphaFold-multimer prediction of experimental GPCR-G protein complexes. Evaluation of AlphaFold models with a resolved experimental structure. A DockQ and B fraction of retrieved native contacts for the models generated with and without the usage of available 3D experimental templates of the monomer structures. N = 125 structural complex models computed for both conditions. The p-values have been computed with a two-sided Wilcoxon signed-rank test. Scatterplot of the DockQ of AlphaFold models computed using structural templates as a function of C the default score and D the pDockQ. Translucent bars represent 95% confidence intervals estimated with bootstrap. Filtered models contain only the residues with pLDDT ≥ 70. Binding energy of the filtered AlphaFold models estimated through Rosetta; E Alphafold complexes contact frequency heatmap for G s (red scale), G i/o (blue scale), G q/11 (green scale) and G 12/13 (purple scale), each corresponding to a differently colored triangle: columns are GPCR positions in GPCRdb numbers, rows are G-protein positions in CGN numbers. Only contacts with frequency > 20% (over the number of unique complexes) are considered. The color intensity is proportional to the contact frequency. Experimental contacts are marked with a black asterisk; for E) all GPCR classes and F) only class A receptors. The analysis considers n = 87 Gs, n = 265 Gq/11, n = 430 Gi/o, and n = 43 G12/13 structural complex models; H ΔG binding (REU) of complexes with class A, class B1, and class C receptors. The p-values have been computed with a two-sided Mann-Whitney U test with Bonferroni correction. ns P > 0.05; * P < 0.05; ** P < 0.01; *** P < 0.001; **** P < 0.0001. Boxplots show the median as the center and the first and third quartiles as the bounds of the box; the whiskers extend to the last data point within 1.5 times the interquartile range (IQR) from the box's boundaries. Source data are provided as a Source Data file.
graphs as a distance metric for the structures. The resulting distance matrix was used to perform a PERMANOVA test 50 to determine if the graphs generated by interaction with the G s and the G i/o family were significantly different from each other. This test compares the difference in docking modes in the same G-protein family to the difference in docking modes between different G-protein families. To evaluate the difference in variance between the distribution of intra-family pairwise distances, we used the PERMDISP test. Both tests were performed using the scikit-bio library (version 0.5.4) in python (https://cran.r-project. org/web/packages/vegan/index.html). We performed 10 6 and 10 4 permutations for PERMANOVA and PERMDISP, respectively.
Analysis was also done on Alphafold predicted complexes considering 430 G i/o , 265 G q/11 , 87 G s and 43 G 12/13 structures according to UCM. Networks have been created for all position pairs as well as for cases where >0.2 complexes have the position pairs.

Fingerprint analysis
We have generated interface fingerprints by creating position vectors by mapping the contacts (1 if present, 0 otherwise) of either residue pairs (Complex fingerprints, or CF), or the individual positions separately for the receptor and G-protein (Receptor and G-protein fingerprints, respectively RF and GF). We performed unsupervised, hierarchical clustering on rows (unique GPCR-G-protein complex) using the "Ward" method and Euclidean distance as metric, employing the clustermap function from seaborn library (version 0.11.1). Rows are color annotated to indicate: G-protein bound in the experimental structure, GPCR class, experimentally reported couplings (according to UCM). The top plot indicates the enrichment of the contacts observed at each position.
For each consensus position of GPCR and Gprotein we calculated the log-odds ratio (LOR) from contingency Table 1 using the following equation: CC and CN terms represent the number of GPCRs coupled to a specific Gprotein group (G i/o or G s ) that are or are not, respectively, in contact at that position (either individual GPCR or G-protein positions or residue pairs). NC and NN terms represent the number of noncoupled GPCRs for a specific G-protein, that are or are not in contact, respectively, at a given position (either individual GPCR or Gprotein positions or residue pairs). Contacts contributed from the loops, N-termini and C-termini of the GPCR where aggregated. We calculated the binning statistics of the log-odds ratio of contacts.

Rosetta multistate design predictions of CCKAR coupling switching mutations
To bias G s over G i/o coupling, or vice versa, we have designed mutations through the Multi-state design by Rosetta 35,51 , as availale in the RosettaCommon software suite (version 2021.16.61629). As a starting template for the design, we have chosen CCKAR (UniProt Accession: P32238), a cholecystokinin receptor that has been solved in complex with G s (PDB ID: 7EZK), G i/o (PDB ID: 7EZH") and G q/11 (PDB ID: 7EZM) 52 . The CCKAR positions selected for design were either involved in specific contacts, based on comparative contacts analysis of G s (7EZK) and G i/o (7EZH) experimental complexes, or those with the highest log-odds ratios from GPCRomE-wide contact pair statistics. To switch G s coupling, we chose to mutate the following positions: 3.54, ICL2.51, ICL2.52, ICL2.53, ICL2.55, ICL3(299), 6.26, 6.30, 6.32, 6.33, 6.36, 7.56, 8.47, 8.48, and 8.51. To switch G i/o coupling we chose the following positions: 2.39, 3.49, 3.50, 3.53, ICL2. 51, 6.25, 6.26, 6.29, 6.33, 8.48, 8.49, and 8.50. Next, we designated positive state and negative state complexes (7ezk or 7ezh). The positive state being the one whose structure and binding energy are preserved by mutations and the negative state being the one which is destabilized by mutations. We used a custom fitness function to describe the energy states of binding interfaces of both complexes with 12 different binding weights (1)(2)(3)(4)(5)(6)(7)(8)(9)(10)(11)(12) and getting for each of them the best fitted structure with a different sequence of mutations for destabilizing either G s or G i/o coupling. We calculated the binding energy of the interface with InterfaceAnalyzer (https://www.rosettacommons.org/ docs/latest/application_documentation/analysis/interfacE-analyzer) and redocked the structures with RosettaDock using local refinement of the interface between GPCR and G-protein structures 53 . We chose the best redocked structures based on the Total score of the redocking tool and calculated the interface binding energy (REU) with InterfaceAnalyzer.
To redock the mutated GPCR-G-protein complexes we used the Rosetta docking protocol with a docking:docking_local_refine flag to refine GPCR-Gα interface. We created top 20 redocked structures for each of the complexes created by MultiState design binding weights (1)(2)(3)(4)(5)(6)(7)(8)(9)(10)(11)(12) and chose the best redocked structure for each of them based on the Total score provided by the protocol 53 . Then we ran Rosetta InterfaceAnalyzer for the best redocked structures and assess the binding interface energy.

NanoBiT-G-protein-dissociation assay for designed CCKAR mutants
Ligand-induced G-protein dissociation was measured by the NanoBiT-G-protein dissociation assay 14 , in which the interaction between a Gα subunit and a Gβγ subunit was monitored by the NanoBiT system (Promega). Specifically, a NanoBiT-G-protein consisting of the Gα subunit fused with a large fragment (LgBiT) at the α-helical domain (Gα-LgBiT) and an N-terminally small fragment (SmBiT)-fused Gγ 2 subunit with a C68S mutation (SmBiT-Gγ 2 -CS) was expressed along with untagged Gβ 1 subunit and a test CCKAR construct. The full-length human CCKAR was inserted into the pCAGGS expression vector with an N-terminal fusion of the hemagglutinin-derived signal sequence (ssHA), FLAG epitope tag and a flexible linker (MKTIIALSYIFCLVFA-DYKDDDDKGGSGGGGSGGSSSGGG; the FLAG epitope tag is underlined). The resulting construct was named as ssHA-FLAG-CCKAR. HEK293A cells (Thermo Fisher Scientific, cat no. R70507) were seeded in a 6-well culture plate at a concentration of 2 × 105 cells ml-1 (2 ml per well in DMEM (Nissui) supplemented with 5% fetal bovine serum (Gibco), glutamine, penicillin and streptomycin), one day before transfection. Transfection solution was prepared by combining 5 µL (per dish hereafter) of polyethylenimine (PEI) Max solution (1 mg ml -1 ; Polysciences), 200 µL of Opti-MEM (Thermo Fisher Scientific) and a plasmid mixture consisting of 200 ng ssHA-FLAG-CCKAR (or an empty plasmid for mock transfection), 100 ng Gα-LgBiT subunit (Gα s -LgBiT or Gα i1 -LgBiT), 500 ng Gβ 1 subunit and 500 ng SmBiT-Gγ 2 -CS subunit. For Gα s -LgBiT, to enhance expression of the NanoBiT-G s sensor, 100 ng RIC8B plasmid was co-transfected. After incubation for 1 day, the transfected cells were harvested with 0.5 mM EDTA-containing Dulbecco's PBS, centrifuged and suspended in 2 ml of HBSS containing 0.01 % bovine serum albumin (BSA; fatty acid-free grade; SERVA) and 5 mM HEPES (pH 7.4) (assay buffer). The cell suspension was dispensed in a white 96-well plate at a volume of 80 µL per well and loaded with 20 µL of 50 µM coelenterazine (Angene) diluted in the assay buffer. After a 2 h incubation at room temperature, the plate was measured for baseline luminescence (SpectraMax L, Molecular Devices) and titrated Article concentrations of sulfated CCK-octapeitide (Peptide Institute, cat no. 4100-v; 20 µL; 6X of final concentrations) were manually added. The plate was immediately read for the second measurement as a kinetics mode and luminescence counts recorded from 5 min to 10 min after compound addition were averaged and normalized to the initial counts. The fold-change values were further normalized to those of vesicle-treated samples and used to plot the G-protein dissociation response. Using the Prism 9 software (GraphPad Prism), the G-protein dissociation signals were fitted to a four-parameter sigmoidal concentration-response curve with a constrain of the HillSlope to absolute values less than 2. For each replicate experiment, the parameter Span (= Top -Bottom) of the individual CCKAR mutants were normalized to those of WT CCKAR performed in parallel and the resulting Emax values were used to calculate ligand response activity of the mutants.

Flow cytometry
Plasmid transfection for the ssHA-FLAG-CCKAR and the NanoBiT-Gi sensor was performed according to the same procedure as described in the "NanoBiT-G-protein-dissociation assay". One day after transfection, the cells were collected by adding Typically, we obtained an MFI value of 2000 (arbitrary unit) for WT CCKAR and 20 for the mock transfection. For each experiment, we normalized an MFI value of the mutants by that of WT performed in parallel and denoted relative levels.

Clustering of G-proteins complex conformations
We compared G i/o and G s complexes by performing Root Mean Square Deviation (RMSD)-based clustering. To calculate RMSD, we created a list of consensus positions based on all the sequences of GPCR and all G-protein in 362 complexes, by first mapping PDB residues to Uniprot canonical sequences via SIFTS 45 and then to GPCRdb consensus numbers 31 . We considered 141 GPCR and 73 G-protein consensus positions defining respectively the consensus core of the 7TM domain and the Ras GTPase domain solved in all experimental structures. The complexes were fitted using the Cα atoms of the GPCR core, while the Cα of the G-protein core were used to calculate the RMSD after superimposition. Calculations were performed using the Superimposer function of the PDB Biopython module 54 (version 1.78) through customized scripts. We performed hierarchical clustering on RMSD using the Ward method with Euclidean distance as metrics, using the clustermap function from seaborn library (version 0.11.1). We compared the distribution of the RMSD calculated among complexes of the G i/o and G s groups using a Wilcoxon rank-sum test. Results were displayed through matplotlib (https://matplotlib.org/) and seaborn (https://seaborn.pydata.org/) libraries using customized python scripts. We also calculated the root mean squared fluctuations of the G-protein consensus positions using the following equation: where x i is the coordinate of particle i and x 0 i is the coordinate of particle i in the reference structure ', which is the complex with the least RMSD deviation from the other complexes (i.e. centroid) in the G s (PDB: 8E3X) and G i/o (PDB: 8F7S) groups.
We compared G s and G i/o groups RMSFs by performing a Wilcoxon test and we plotted each position and its standard deviation.

Analysis of GPCR-Gα subunit binding energy with Rosetta
To analyze the GPCR-Gα interface in a 3D structural model, we first relaxed the structure using the Rosetta relax application 55 , using backbone constraints. Then we ran Rosetta InterfaceAnalyzer 36 (https:// www.rosettacommons.org/docs/latest/application_documentation/ analysis/interface-analyzer), from RosettaCommon software suite (version 2021. 16.61629), specifying the chains of the GPCR and the Gα which are interacting in the complex. This protocol takes a multichain complex as input and computes a new structure in which the two chains of interest are separated. The interface energy and the ΔSASA are calculated as the difference in energy and SASA in the bound and unbound structure. We run InterfaceAnalyzer with the "-pack_input" and "-pack_separated" flags to optimize the side chain configuration before and after separating the chains. If a nanobody was present in a structure, we removed it before the relaxation step, to limit its influence on the analysis. The interface energy is computed according to the Rosetta energy function, which includes physics-based terms that represent electrostatic and van der Waals' interactions, as well as statistical terms representing the probability of finding the torsion angles in the Ramachandran plots. This score is indicated in Rosetta Energy Units (REU) and cannot be converted into the actual binding energy, but it gives a reasonable estimation of the stability of the complex 56 .

Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Data availability
Source data are provided with this paper. Data generated for this study are available at https://github.com/raimondilab/GPCR_ structure_analysis and https://doi.org/10.5281/zenodo.8067369. GPCR-Heterotrimeric G protein complexes predicted with AFmultimer are also available via Precogx webserver (https://precogx. bioinfolab.sns.it/). The raw data used in this study are available in the Zenodo database under the accession code https://doi.org/10.5281/ zenodo.8063796. The PDB accession codes used in this study can be found in Supplementary Data 1, corresponding to a total of 362 experimental structures analyzed. Source data are provided with this paper.